Abstract. We consider a hidden Markov model with multiplicative noise
emerging from studies of software reliability. We show the stability of the
optimal filter w.r.t. general initial conditions in the total variation norm and
in the $L^p$-norm and deduce explicit rates. Remarkably, stability turns out to be
independent of the ergodic behavior of the signal.

1. Introduction 

The stability of nonlinear filters is a field of active research, see e.g. [3], the
introduction of [9] and references therein. However, the majority of results require
the signal process to be ergodic or stable in some sense. In addition, most of the
results are obtained for signals observed with additive noise. The case of nonergodic
signals and the case of signals observed with multiplicative noise still remain
mostly open. In this context, the present article studies the stability of the optimal
filter in the following hidden Markov model:

(1) Signal: $X_n = b X_{n-1} W_n$,

(2) Observation: $Y_n = X_n^{-1} G_n$, $n \in \mathbb{N}$,

where $W_n$, $n \in \mathbb{N}$, are independent identically Beta$(\alpha, \beta)$-distributed random
variables, $\alpha, \beta > 0$, describing the noise incorporated in the unknown signal, $b$ is a
positive parameter that determines whether the unknown signal process is ergodic
or nonergodic, and $G_n$, $n \in \mathbb{N}$, are independent $\Gamma(1, \beta)$-distributed random variables.
Hence, the observation $Y_n$ depends on $X_n$ via multiplication with the independent noise
$G_n$. Thus, model (1) and (2) is an example of filtering a signal observed with
multiplicative noise. Note that although taking logarithms leads to a classical linear
model with additive noise, stability cannot be studied immediately with known
methods such as those proposed in e.g. [8], [4], since the corresponding noise terms are
rather irregular: e.g. they are neither unimodal nor do they have light tails.
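
As an illustration, the model (1) and (2) can be simulated directly. The following is a minimal sketch and not part of the original analysis: the function name `simulate`, the fixed starting value `x0` and the seed handling are assumptions made for illustration. It also assumes the convention that $\Gamma(\lambda, q)$ denotes the Gamma distribution with rate $\lambda$ and shape $q$, so that $G_n$ has shape $\beta$ and rate 1.

```python
import random


def simulate(n, b, alpha, beta, x0, seed=0):
    """Simulate X_k = b * X_{k-1} * W_k and Y_k = G_k / X_k for k = 1..n.

    W_k ~ Beta(alpha, beta); G_k ~ Gamma with shape beta and rate 1.
    x0 is a fixed starting value (an assumption of this sketch).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(n):
        w = rng.betavariate(alpha, beta)   # multiplicative signal noise in (0, 1)
        g = rng.gammavariate(beta, 1.0)    # shape beta, scale 1 (i.e. rate 1)
        x = b * x * w                      # signal equation (1)
        xs.append(x)
        ys.append(g / x)                   # observation equation (2)
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(10, b=1.5, alpha=2.0, beta=3.0, x0=1.0, seed=1)
    for k, (x, y) in enumerate(zip(xs, ys), start=1):
        print(k, x, y)
```

Since $W_n \le 1$, every simulated path satisfies $X_n \le b X_{n-1}$, which is the support constraint visible in the transition density of the signal.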

Essentially, the above model appears in [1] as an example of models admitting
explicit invariant conditional distributions. In our case this amounts to the fact
that the incorporated assumptions on the distributions of signal and observation
together with a corresponding initial distribution, i.e.

signal: $(X_n / (b X_{n-1}) \mid X_{n-1}) \sim \mathrm{Beta}(\alpha, \beta)$

initial distribution: $X_0 \sim \Gamma(\lambda, q)$, where $q = \alpha + \beta$ and $\lambda > 0$

observation: $(Y_n \mid X_n) \sim \Gamma(X_n, \beta)$

(here $\Gamma(\lambda, q)$ denotes the Gamma distribution with rate $\lambda$ and shape $q$)

imply the following explicit updating rules: 



2000 Mathematics Subject Classification. 93E11, 93E15, 60G35. 
Key words and phrases. Stability, Optimal filter, Multiplicative noise. 




BIRGIT DEBRABANT AND WILHELM STANNAT



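posterior of $X_n$: $(X_n \mid Y_{1:n}) \sim \Gamma(\lambda_n, q)$, where $Y_{1:n} = Y_1, \ldots, Y_n$

prior of $X_{n+1}$: $(X_{n+1} \mid Y_{1:n}) \sim \Gamma(\lambda_n / b, \alpha)$

1-step ahead prediction: $(b Y_{n+1} / \lambda_n \mid Y_{1:n}) \sim$ Pearson Type VI
with parameters $\beta$ and $\alpha$, cp. [5]

posterior of $X_{n+1}$: $(X_{n+1} \mid Y_{1:n+1}) \sim \Gamma(\lambda_{n+1}, q)$,
where $\lambda_{n+1} = \frac{\lambda_n}{b} + y_{n+1}$.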
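
The last updating rule is a one-line recursion for the rate parameter of the Gamma posterior. The sketch below is an illustration under the assumption that the rule reads $\lambda_{n+1} = \lambda_n / b + y_{n+1}$; the function names are hypothetical. It also compares the recursion with its unrolled form $\lambda_n = b^{-n}\lambda + \sum_{j=1}^n b^{j-n} y_j$.

```python
def filter_rates(lam0, b, ys):
    """Return [lambda_1, ..., lambda_n] from lambda_{k+1} = lambda_k / b + y_{k+1}."""
    rates = []
    lam = lam0
    for y in ys:
        lam = lam / b + y   # prior step divides the rate by b, update step adds y
        rates.append(lam)
    return rates


def closed_form_rate(lam0, b, ys):
    """Unrolled recursion: b^{-n} * lam0 + sum_{j=1}^{n} b^{j-n} * y_j."""
    n = len(ys)
    return b ** (-n) * lam0 + sum(b ** (j - n) * y for j, y in enumerate(ys, start=1))


if __name__ == "__main__":
    ys = [0.5, 1.0, 2.0]
    print(filter_rates(1.0, 2.0, ys))
    print(closed_form_rate(1.0, 2.0, ys))
```

With shape $q = \alpha + \beta$ and rate $\lambda_n$, the posterior mean of $X_n$ is $q / \lambda_n$, so the whole filter is carried by this scalar sequence.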
To study software reliability, model (1) and (2) was later applied in [6], [2] as an
enhancement to the Kalman filter, taking into account that failure data tends to
be highly skewed and that observational errors are not mainly caused by instrumental
inaccuracies. There, the observables $Y_n$ can be interpreted as interfailure times
of some software. The $X_n$ play the role of unknown parameters steering their
distribution. To model an evolution of the software the parameters evolve according
to (1). The value of $b$ is typically unknown and indicates whether there is a tendency
of increasing reliability, since e.g.
$E(Y_n \mid Y_{1:n-1}, \lambda_0) = \frac{\beta}{\alpha - 1} b^{-n} \big( \lambda_0 + \sum_{j=1}^{n-1} b^j Y_j \big)$,
with $\lambda_0 = \lambda$ the parameter of the initial distribution,
obviously tends to infinity if $b < 1$.

Our stability results give the dependence of the optimal filter $\pi_n^{y_{1:n}}$, that is the
regular conditional distribution of $X_n$ given the observations $y_{1:n} = (y_k)_{k=1,\ldots,n}$,
on the initial distribution $\pi_0$ of $X_0$. To cover a wider range of admissible initial
conditions we extend the assumptions of [1], [2] and suppose the initial distribution
of $X_0$ to be a mixed Gamma distribution with the following density

(3) $\pi_0(x) \propto x^{q-1} \int_0^\infty \lambda^q e^{-\lambda x} \, dU_0(\lambda)$, $x > 0$,

where $q = \alpha + \beta$ and $U_0$ is a probability measure on a compact subset of $(0, \infty)$.
Such a mixture conserves the conjugacy of this distribution, cp. Lemma 6, which
shows that all the posterior distributions are of a similar type.

The pure Gamma case, where $U_0$ is a Dirac measure, is easy to treat and will
serve us as an introductory example, see Section 2. Provided that the unknown true
initial condition $\tilde\pi_0$ is absolutely continuous w.r.t. the assumed initial condition $\pi_0$
with a density bounded away from 0, we show stability in the total variation norm
(almost surely given the observations) and in the $L^p$-norm for arbitrary $p > 0$
(expectation w.r.t. the observations $Y_1, Y_2, \ldots$), both with explicit geometric rates.

In Sections 3 and 4 we pass on to the general mixed Gamma initial condition.
Under assumptions similar to those in Sec. 2 we show stability with geometric rates.
Concerning the total variation norm, Theorem 5 gives that almost surely

(4) $\|\pi_n^{y_{1:n}} - \tilde\pi_n^{y_{1:n}}\|_{var} = O(\delta^{-n})$, $n \to \infty$,

for $\delta < E(W_1^\beta)^{-1/\beta}$, which coincides with the pure Gamma case. Note that the rate
of stability in (4) is independent of the parameter $b$ in the signal. That is, the filter
is stable whether or not the signal is ergodic. However, the constant on the right
hand side of (4) will depend on the given sequence $y_1, y_2, \ldots$ of observations. For
the $L^p$-norm with $p \in \big(0, -\frac{u_0}{\tilde B} \ln E(W_1^\beta)\big)$, Theorem 9 yields

$E\big(\|\pi_n^{Y_{1:n}} - \tilde\pi_n^{Y_{1:n}}\|_{var}^p\big) = O(\rho^n)$, $n \to \infty$,

where $\rho = \big(E(W_1^\beta) e^{p \tilde B / u_0}\big)^{\frac{p}{p+\beta}}$ and where $\tilde B = \tilde B(\beta, u_0, o_0)$ is a positive constant
specified in the theorem. These rates are slower than the rate $\rho = E(W_1^p)$ obtained in the
pure Gamma case.
All stability statements of this article are based on a universal stability result
of [7]. There, a general bound for the total variation distance of optimal filters
w.r.t. the initial condition is deduced in terms of the Lipschitz contraction $\chi_n^*$ of
a certain parabolic ground state transform $P_n^*$ associated with $\pi_n^{y_{1:n}}$. More precisely,
suppose that the optimal filter is erroneously initialized with initial condition $\pi_0$
and that $\tilde\pi_0$ is the true initial condition. If $\tilde\pi_0 \ll \pi_0$ with density $h_0 = \frac{d\tilde\pi_0}{d\pi_0}$, it
follows that $\tilde\pi_n^{y_{1:n}} \ll \pi_n^{y_{1:n}}$ with density $h_n \propto P_n^* \circ \cdots \circ P_1^* h_0$, $n \in \mathbb{N}$,
where $P_n^* f := \frac{\hat P(\pi_{n-1}^{y_{1:n-1}} f)}{\hat P(\pi_{n-1}^{y_{1:n-1}})}$. Here $\hat P$ resp. $\hat p$ is the dual operator of the
transition probability $P$ resp. the transition density $p$ of the signal, that is
$P(X_{n+1} \in dx_{n+1} \mid X_n) = p(X_n, x_{n+1})\, dx_{n+1}$ and $\hat P f = \int p(x, \cdot) f(x)\, dx$, and $g$ denotes
the regular conditional density of the observation given the signal, i.e.
$P(Y_n \in dy \mid X_n) = g(X_n, y)\, dy$. It follows that the error between the true optimal filter
$\tilde\pi_n^{y_{1:n}}$ and the erroneously initialized filter $\pi_n^{y_{1:n}}$ can be expressed in terms of the
Lipschitz contractions $\chi_n^*$ of the Markovian transition kernels $P_n^*$, where

$\chi_n^* := \sup_{f \in Lip} \frac{\|P_n^* f\|_{Lip}}{\|f\|_{Lip}}$

and where $Lip$ is the space of all Lipschitz continuous functions
with the corresponding norm $\|\cdot\|_{Lip}$:

Proposition 1 (cp. Prop. 2.1 of [7]). For the total variation distance of the true
optimal filter $\tilde\pi_n^{y_{1:n}}$ to a wrongly initialized $\pi_n^{y_{1:n}}$ the following explicit bound is valid:

(5) $\|\tilde\pi_n^{y_{1:n}} - \pi_n^{y_{1:n}}\|_{var} \le \frac{a_n}{H} \prod_{k=1}^n \chi_k^* \cdot \|h_0\|_{Lip}$, $n \in \mathbb{N}$,

provided that $\tilde\pi_0$ has $\pi_0$-density $h_0$ which is bounded away from 0 by a positive
constant $H$, and with $a_n = \int_0^\infty \int_0^\infty |x_1 - x_2| \, d\pi_n^{y_{1:n}}(x_1)\, d\pi_n^{y_{1:n}}(x_2)$.

Note that the formulation in [7] is more general and gives a bound also in the
case $H = 0$.

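Clearly, for our model (1) and (2) we obtain

$p(x, y) = \frac{(bx)^{1-q}}{B(\alpha, \beta)} y^{\alpha-1} (bx - y)^{\beta-1} 1_{[0, bx]}(y)$ and

$\hat p(x, dy) = p(y, x)\, dy = \frac{(by)^{1-q}}{B(\alpha, \beta)} x^{\alpha-1} (by - x)^{\beta-1} 1_{[b^{-1}x, \infty)}(y)\, dy.$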

2. Stability w. r. t. pure Gamma initial conditions 

We consider the particular case $\pi_0(x) \propto x^{q-1} \lambda^q e^{-\lambda x}$, $x > 0$, for some $\lambda > 0$,
hence $X_0$ is $\Gamma(\lambda, q)$-distributed. Using the recursion

(6) $\pi_{n+1}^{y_{1:n+1}}(x) \propto g(x, y_{n+1}) \int p(s, x)\, \pi_n^{y_{1:n}}(ds)$,

it is straightforward to show that the corresponding optimal filters $\pi_n^{y_{1:n}}$, $n \in \mathbb{N}$,
are again Gamma-distributed, namely $\Gamma(\lambda_n, q)$ with $\lambda_n = b^{-n} \lambda + \sum_{j=1}^n b^{j-n} y_j$.

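To study the asymptotic dependence of the optimal filter on the initial condition
we apply Prop. 1. Since

(7) $P_n^*(x, dy) = \frac{\lambda_{n-1}^\beta}{\Gamma(\beta)} (y - b^{-1}x)^{\beta-1} e^{-\lambda_{n-1}(y - b^{-1}x)} 1_{(b^{-1}x, \infty)}(y)\, dy$,

we find $\|P_n^*\|_{Lip} = b^{-1}$, $n \in \mathbb{N}$. Further

$a_n \le \Big( \int_0^\infty \int_0^\infty (x_1 - x_2)^2 \, d\Gamma(\lambda_n, q)(x_1)\, d\Gamma(\lambda_n, q)(x_2) \Big)^{1/2} = \frac{\sqrt{2q}}{\lambda_n}$, $n \in \mathbb{N}$.

Prop. 1 then implies

(8) $\|\tilde\pi_n^{y_{1:n}} - \pi_n^{y_{1:n}}\|_{var} \le \frac{\sqrt{2q}\, \|h_0\|_{Lip}}{H} (b^n \lambda_n)^{-1}$, $n \in \mathbb{N}$,

if the true initial condition $\tilde\pi_0$ has a $\pi_0$-density which is bounded away from 0 by
some constant $H > 0$.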

The following lemma analyses the limiting behavior of $b^n \lambda_n = \lambda + \sum_{j=1}^n b^j y_j$,
which evidently plays a crucial role concerning stability:

Lemma 2. $P(\liminf_{j \to \infty} \{b^j Y_j > \delta^j\}) = 1$ for arbitrary $\delta < E(W_1^\beta)^{-1/\beta}$. In particular,
$\sum_{j=1}^\infty b^j Y_j = +\infty$ almost surely.

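Proof. We may assume $\delta > 0$. Note that

(9) $P(b^j Y_j \le \delta^j) = E\Big[ \frac{X_j^\beta}{\Gamma(\beta)} \int_0^{(\delta/b)^j} y^{\beta-1} e^{-X_j y}\, dy \Big] \le \frac{(\delta/b)^{j\beta}}{\Gamma(\beta+1)} E(X_j^\beta) = \big(\delta^\beta E(W_1^\beta)\big)^j \frac{E(X_0^\beta)}{\Gamma(\beta+1)}$

is summable for $\delta < E(W_1^\beta)^{-1/\beta}$. Hence, the Borel-Cantelli lemma yields
$P(\liminf_{j \to \infty} \{b^j Y_j > \delta^j\}) = 1$. Since $E(W_1^\beta)^{-1/\beta} > 1$, we can choose $\delta > 1$, which
implies that $\sum_{j=1}^\infty b^j Y_j$ diverges to infinity almost surely. □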
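
The threshold $E(W_1^\beta)^{-1/\beta}$ appearing in Lemma 2 can be evaluated in closed form, since for $W \sim$ Beta$(\alpha, \beta)$ the standard moment identity $E(W^s) = \frac{\Gamma(\alpha+s)\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\alpha+\beta+s)}$ holds for $s > 0$. The following sketch computes it; the helper names `beta_moment` and `delta_threshold` are hypothetical and only serve the illustration.

```python
import math


def beta_moment(alpha, beta, s):
    """E(W**s) for W ~ Beta(alpha, beta), computed via log-Gamma for stability."""
    log_m = (math.lgamma(alpha + s) + math.lgamma(alpha + beta)
             - math.lgamma(alpha) - math.lgamma(alpha + beta + s))
    return math.exp(log_m)


def delta_threshold(alpha, beta):
    """Upper limit E(W_1**beta)**(-1/beta) for the admissible geometric rates delta."""
    return beta_moment(alpha, beta, beta) ** (-1.0 / beta)


if __name__ == "__main__":
    for a, b_ in [(1.0, 1.0), (2.0, 3.0), (5.0, 0.5)]:
        print(a, b_, delta_threshold(a, b_))
```

Because $W_1 < 1$ almost surely, the moment is below 1 and the threshold always exceeds 1, which is what allows the choice $\delta > 1$ in the proof.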

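Remark 3. Lemma 2 remains valid for any initial condition $\pi_0$ satisfying $E(X_0^\beta) < \infty$.

Concerning almost sure stability, Lemma 2 implies that (8) converges to 0 almost
surely and behaves as $O(\delta^{-n})$ for every $\delta < E(W_1^\beta)^{-1/\beta}$.

Moreover, concerning stability in the $L^p$-norm, for $0 < p < \beta$ we find

$E\big(\|\tilde\pi_n^{Y_{1:n}} - \pi_n^{Y_{1:n}}\|_{var}^p\big) \le \Big(\frac{\sqrt{2q}\, \|h_0\|_{Lip}}{H}\Big)^p E\big((b^n \lambda_n)^{-p}\big) \le \Big(\frac{\sqrt{2q}\, \|h_0\|_{Lip}}{H}\Big)^p \frac{\Gamma(\beta - p)}{\Gamma(\beta)} E(X_0^p) \big(E(W_1^p)\big)^n$,

$n \in \mathbb{N}$. Therefore, the optimal filter is stable in the corresponding $L^p$-norm and
satisfies

$E\big(\|\tilde\pi_n^{Y_{1:n}} - \pi_n^{Y_{1:n}}\|_{var}^p\big) = O(\rho^n)$, $n \to \infty$, with $\rho = E(W_1^p)$.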

3. Stability in the total variation norm

We return to the more general mixed Gamma initial conditions of the form (3)
and deduce an estimate of the total variation distance of $\pi_n^{y_{1:n}}$ w.r.t. differing initial
conditions in Thm. 4, which then implies almost sure stability with geometric rates,
cp. Thm. 5.

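Theorem 4. Let $\pi_0$ and $\tilde\pi_0$ be such that

• $\pi_0(x) \propto x^{q-1} \int_0^\infty \lambda^q e^{-\lambda x}\, dU_0(\lambda)$ for some probability measure $U_0$ with
support $[u_0, o_0] \subset (0, \infty)$, and

• $\tilde\pi_0$ is a probability density on $\mathbb{R}_+$ such that the corresponding density $h_0 := \frac{\tilde\pi_0}{\pi_0}$
is Lipschitz continuous and $h_0 \ge H > 0$ for some constant $H$.

Let $y_{1:n} \in \mathbb{R}_+^n$ and let $\pi_n^{y_{1:n}}$ resp. $\tilde\pi_n^{y_{1:n}}$ be the optimal filter w.r.t. $\pi_0$ resp. $\tilde\pi_0$.
Then

(10) $\|\pi_n^{y_{1:n}} - \tilde\pi_n^{y_{1:n}}\|_{var} \le Q \|h_0\|_{Lip} (b^n u_n)^{-1} \prod_{k=0}^{n-1} \Big(1 + B \frac{o_k - u_k}{u_k}\Big)$, $n \in \mathbb{N}$,

where $Q = \frac{\sqrt{2q(q+1)}}{H}$ and $B = \frac{1}{2} \Big( \beta(\beta+1) - \frac{2\beta^2 u_0}{o_0} + \frac{\beta(\beta+1) o_0^2}{u_0^2} \Big)^{1/2}$.
This estimate induces stability with geometric rates:

Theorem 5. Under the assumptions of Theorem 4 the optimal filter is stable for
almost all sequences $y_1, y_2, \ldots$ of observations and we have

(11) $\|\pi_n^{y_{1:n}} - \tilde\pi_n^{y_{1:n}}\|_{var} = O(\delta^{-n})$, $n \to \infty$,

for every $\delta < E(W_1^\beta)^{-1/\beta}$.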

Proofs and preliminary results. Before proving the above main statements we
consider two preliminary results. Firstly note that posterior distributions
corresponding to an initial condition of the form (3) are again of mixed Gamma type:

Lemma 6. The posterior distributions $\pi_n^{y_{1:n}}$ are mixed Gamma distributions with
densities

$\pi_n^{y_{1:n}}(x) \propto x^{q-1} \int_0^\infty \lambda^q e^{-\lambda x}\, dU_n(\lambda)$, $x > 0$.

Here, the distributions $U_n$, $n \in \mathbb{N}$, obey the following recursive scheme:

• $U_n^{(1)}(d\lambda) = \lambda^\alpha U_n(d\lambda)$,

• $U_n^{(2)}(A) = U_n^{(1)}(b \cdot A)$ for $A \in \mathcal{B}(\mathbb{R}_+)$,

• $U_n^{(3)} = \delta_{y_{n+1}} * U_n^{(2)}$, where $\delta_{y_{n+1}}$ denotes the Dirac measure in $y_{n+1}$ and $*$
the convolution,

• $U_{n+1}(d\lambda) = \lambda^{-q} U_n^{(3)}(d\lambda)$.

Remark 7. For $U_0 = \delta_\lambda$ with some $\lambda > 0$ we obtain $U_n = \delta_{b^{-n}\lambda + \sum_{k=1}^n b^{k-n} y_k}$. This
setup corresponds to the pure Gamma case of Section 2.

If $U_0$ has compact support $[u_0, o_0]$ then $U_n$ also has compact support $[u_n, o_n]$
with $u_n = b^{-n} u_0 + \sum_{k=1}^n b^{k-n} y_k$ and $o_n = b^{-n} o_0 + \sum_{k=1}^n b^{k-n} y_k$, $n \in \mathbb{N}$.
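
The support endpoints follow the same scalar recursion as the rate parameter in the pure Gamma case, namely $u_n = u_{n-1}/b + y_n$ and $o_n = o_{n-1}/b + y_n$. The sketch below is an illustration only (the function name `support_endpoints` is hypothetical); it makes visible that the width $o_n - u_n = b^{-n}(o_0 - u_0)$ does not depend on the observations, while the ratio $o_n / u_n$ never increases.

```python
def support_endpoints(u0, o0, b, ys):
    """Return [(u_0, o_0), (u_1, o_1), ..., (u_n, o_n)] for observations ys."""
    u, o = u0, o0
    endpoints = [(u, o)]
    for y in ys:
        u = u / b + y   # lower endpoint: scale by 1/b, then shift by the observation
        o = o / b + y   # upper endpoint: same map, so the width shrinks by 1/b
        endpoints.append((u, o))
    return endpoints


if __name__ == "__main__":
    for n, (u, o) in enumerate(support_endpoints(1.0, 3.0, 2.0, [0.5, 1.0, 2.0])):
        print(n, u, o, o - u, o / u)
```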
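Proof. The recursion (6) implies

$\pi_{n+1}^{y_{1:n+1}}(x)$

$\propto x^\beta e^{-x y_{n+1}} \int_{b^{-1}x}^\infty (bs)^{1-q} x^{\alpha-1} (bs - x)^{\beta-1} \cdot s^{q-1} \int_0^\infty \lambda^q e^{-\lambda s}\, dU_n(\lambda)\, ds$

$\propto x^{q-1} e^{-x y_{n+1}} \int_0^\infty \lambda^\alpha e^{-\lambda b^{-1} x}\, dU_n(\lambda)$

$= x^{q-1} e^{-x y_{n+1}} \int_0^\infty e^{-\lambda b^{-1} x}\, dU_n^{(1)}(\lambda) = x^{q-1} e^{-x y_{n+1}} \int_0^\infty e^{-\lambda x}\, dU_n^{(2)}(\lambda)$

$= x^{q-1} \int_0^\infty e^{-\lambda x}\, dU_n^{(3)}(\lambda) = x^{q-1} \int_0^\infty \lambda^q e^{-\lambda x}\, dU_{n+1}(\lambda)$, $x > 0$.

□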

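In order to apply Prop. 1 we deduce an upper bound for the contraction
coefficient $\chi_n^*$ of the ground state transform $P_n^*$:

Proposition 8. Let $f : \mathbb{R}_+ \to \mathbb{R}$ be Lipschitz continuous. Under the assumptions
of Thm. 4 we find

$\|P_n^* f\|_{Lip} \le \frac{\sup_{x>0} d_n(x) + 1}{b} \|f\|_{Lip}$

with

$d_n(x) = \frac{1}{2} \int_0^\infty \int_0^\infty |\lambda_1 - \lambda_2| \Big( \frac{\beta(\beta+1)}{\lambda_1^2} - \frac{2\beta^2}{\lambda_1 \lambda_2} + \frac{\beta(\beta+1)}{\lambda_2^2} \Big)^{1/2} dU_{n-1}^x(\lambda_1)\, dU_{n-1}^x(\lambda_2)$

and

$dU_{n-1}^x(\lambda) = \frac{\lambda^\alpha e^{-\lambda b^{-1} x}\, dU_{n-1}(\lambda)}{\int_0^\infty \lambda^\alpha e^{-\lambda b^{-1} x}\, dU_{n-1}(\lambda)}$.

Moreover

$\sup_{x>0} d_n(x) \le \Big( \frac{o_{n-1}}{u_{n-1}} - 1 \Big) \frac{1}{2} \Big( \beta(\beta+1) - \frac{2\beta^2 u_0}{o_0} + \frac{\beta(\beta+1) o_0^2}{u_0^2} \Big)^{1/2}$, $n \in \mathbb{N}$.

Notations: To simplify notations in the following proof we introduce for $x, y > 0$
and $n \in \mathbb{N}$ the following measures derived from $U_n$:

$dU_n^x(\lambda) = \frac{\lambda^\alpha e^{-\lambda b^{-1} x}\, dU_n(\lambda)}{\int_0^\infty \lambda^\alpha e^{-\lambda b^{-1} x}\, dU_n(\lambda)}$,

$dU_n^{x,y}(\lambda) = \frac{\lambda^\beta}{\Gamma(\beta)} y^{\beta-1} e^{-\lambda y}\, dU_n^x(\lambda)$,

$d\bar U_n^x(\lambda, y) = dU_n^{x,y}(\lambda)\, dy$.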
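Moreover, note the following estimate via Jensen's inequality

(12) $\int_0^\infty \int_0^\infty |y_1 - y_2| \, d\Gamma(\lambda_1, \beta)(y_1)\, d\Gamma(\lambda_2, \beta)(y_2) \le \Big( \int_0^\infty \int_0^\infty (y_1 - y_2)^2 \, d\Gamma(\lambda_1, \beta)(y_1)\, d\Gamma(\lambda_2, \beta)(y_2) \Big)^{1/2} = \Big( \frac{\beta(\beta+1)}{\lambda_1^2} - \frac{2\beta^2}{\lambda_1 \lambda_2} + \frac{\beta(\beta+1)}{\lambda_2^2} \Big)^{1/2}$.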



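Proof. Let $n \in \mathbb{N}$. Due to the structure of the optimal filter we obtain for $x, y > 0$

$p_n^*(x, y) = \frac{\int_0^\infty \lambda^q e^{-\lambda y} (y - b^{-1}x)^{\beta-1}\, dU_{n-1}(\lambda)}{\Gamma(\beta) \int_0^\infty \lambda^\alpha e^{-\lambda b^{-1} x}\, dU_{n-1}(\lambda)} 1_{(b^{-1}x, \infty)}(y)$.

For $x_1, x_2 > 0$ we find

$P_n^* f(x_1) - P_n^* f(x_2) = \int_{x_1/b}^\infty f(y)\, p_n^*(x_1, y)\, dy - \int_{x_2/b}^\infty f(y)\, p_n^*(x_2, y)\, dy$

$= \int_0^\infty \Big[ f\big(y + \tfrac{x_1}{b}\big) - f\big(y + \tfrac{x_2}{b}\big) \Big] p_n^*\big(x_2, y + \tfrac{x_2}{b}\big)\, dy \;(=: T_1)$

$+ \int_0^\infty f\big(y + \tfrac{x_1}{b}\big) \Big[ p_n^*\big(x_1, y + \tfrac{x_1}{b}\big) - p_n^*\big(x_2, y + \tfrac{x_2}{b}\big) \Big] dy \;(=: T_2)$.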



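As $x \mapsto p_n^*(x, y + b^{-1}x)$ is differentiable with

$\frac{d}{dx} p_n^*\big(x, y + \tfrac{x}{b}\big) = - \frac{\int_0^\infty \lambda^{q+1} e^{-\lambda(y + b^{-1}x)} y^{\beta-1}\, dU_{n-1}(\lambda)}{b\, \Gamma(\beta) \int_0^\infty \lambda^\alpha e^{-\lambda b^{-1} x}\, dU_{n-1}(\lambda)} + \frac{\int_0^\infty \lambda^q e^{-\lambda(y + b^{-1}x)} y^{\beta-1}\, dU_{n-1}(\lambda) \int_0^\infty \lambda^{\alpha+1} e^{-\lambda b^{-1} x}\, dU_{n-1}(\lambda)}{b\, \Gamma(\beta) \big( \int_0^\infty \lambda^\alpha e^{-\lambda b^{-1} x}\, dU_{n-1}(\lambda) \big)^2}$

$= - \int_0^\infty \frac{\lambda}{b}\, dU_{n-1}^{x,y}(\lambda) + \int_0^\infty dU_{n-1}^{x,y}(\lambda) \int_0^\infty \int_0^\infty \frac{\lambda}{b}\, dU_{n-1}^{x,y}(\lambda)\, dy$,

for $T_2$ we find

$T_2 = \int_{x_2}^{x_1} \int_0^\infty f\big(y + \tfrac{x_1}{b}\big) \frac{d}{dx} p_n^*\big(x, y + \tfrac{x}{b}\big)\, dy\, dx$, where the inner integral is denoted by $T_3$.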
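Further we have

$T_3 = - \int_0^\infty \int_0^\infty f\big(y + \tfrac{x_1}{b}\big) \frac{\lambda}{b}\, d\bar U_{n-1}^x(\lambda, y) + \int_0^\infty \int_0^\infty f\big(y + \tfrac{x_1}{b}\big)\, d\bar U_{n-1}^x(\lambda, y) \cdot \int_0^\infty \int_0^\infty \frac{\lambda}{b}\, d\bar U_{n-1}^x(\lambda, y)$

$= - \frac{1}{2} \int_0^\infty \int_0^\infty \int_0^\infty \int_0^\infty \Big( f\big(y_1 + \tfrac{x_1}{b}\big) - f\big(y_2 + \tfrac{x_1}{b}\big) \Big) \Big( \frac{\lambda_1}{b} - \frac{\lambda_2}{b} \Big)\, d\bar U_{n-1}^x(\lambda_1, y_1)\, d\bar U_{n-1}^x(\lambda_2, y_2)$,

since $\bar U_{n-1}^x$ is a probability measure on $[0, \infty)^2$. Therefore,

$|T_3| \le \frac{\|f\|_{Lip}}{2b} \int_0^\infty \int_0^\infty \int_0^\infty \int_0^\infty |y_1 - y_2| \cdot |\lambda_1 - \lambda_2|\, d\bar U_{n-1}^x(\lambda_1, y_1)\, d\bar U_{n-1}^x(\lambda_2, y_2)$

$= \frac{\|f\|_{Lip}}{2b} \int_0^\infty \int_0^\infty \Big( \int_0^\infty \int_0^\infty |y_1 - y_2|\, d\Gamma(\lambda_1, \beta)(y_1)\, d\Gamma(\lambda_2, \beta)(y_2) \Big) \cdot |\lambda_1 - \lambda_2|\, dU_{n-1}^x(\lambda_1)\, dU_{n-1}^x(\lambda_2)$

and formula (12) yields

$|T_3| \le \frac{\|f\|_{Lip}}{2b} \int_0^\infty \int_0^\infty |\lambda_1 - \lambda_2| \Big( \frac{\beta(\beta+1)}{\lambda_1^2} - \frac{2\beta^2}{\lambda_1 \lambda_2} + \frac{\beta(\beta+1)}{\lambda_2^2} \Big)^{1/2} dU_{n-1}^x(\lambda_1)\, dU_{n-1}^x(\lambda_2)$

$= \frac{\|f\|_{Lip}}{2b} \int_0^\infty \int_0^\infty \frac{|\lambda_1 - \lambda_2|}{\lambda_1} \Big( \beta(\beta+1) - 2\beta^2 \frac{\lambda_1}{\lambda_2} + \beta(\beta+1) \frac{\lambda_1^2}{\lambda_2^2} \Big)^{1/2} dU_{n-1}^x(\lambda_1)\, dU_{n-1}^x(\lambda_2)$

$\le \frac{\|f\|_{Lip}}{2b} \Big( \frac{o_{n-1}}{u_{n-1}} - 1 \Big) \Big( \beta(\beta+1) - \frac{2\beta^2 u_0}{o_0} + \frac{\beta(\beta+1) o_0^2}{u_0^2} \Big)^{1/2}$, where the last factor equals $2B$,

since $1 \le \frac{o_n}{u_n} \le \frac{o_0}{u_0}$ and $\frac{o_n}{u_n} \downarrow 1$ due to Lemma 2. For $T_2$ we finally have

$|T_2| \le |x_1 - x_2| \frac{\|f\|_{Lip}}{b} \Big( \frac{o_{n-1}}{u_{n-1}} - 1 \Big) B$,

which together with the obvious estimate $|T_1| \le \frac{\|f\|_{Lip}}{b} \cdot |x_1 - x_2|$ yields

$\|P_n^* f\|_{Lip} \le \frac{1 + \big( \frac{o_{n-1}}{u_{n-1}} - 1 \big) B}{b} \|f\|_{Lip}$.

□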



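Now we can proceed with the proofs of Theorems 4 and 5.

Proof of Thm. 4. We apply Prop. 1. Due to Prop. 8 we have

$\chi_k^* \le \frac{1}{b} \Big( 1 + B \frac{o_{k-1} - u_{k-1}}{u_{k-1}} \Big)$, $k \in \mathbb{N}$.

For $a_n$ we find

$a_n = \int_0^\infty \int_0^\infty \int_0^\infty \int_0^\infty |x_1 - x_2|\, d\Gamma(\lambda_1, q)(x_1)\, d\Gamma(\lambda_2, q)(x_2)\, dU_n(\lambda_1)\, dU_n(\lambda_2)$

$\le \int_0^\infty \int_0^\infty \Big( \frac{q(q+1)}{\lambda_1^2} - \frac{2q^2}{\lambda_1 \lambda_2} + \frac{q(q+1)}{\lambda_2^2} \Big)^{1/2} dU_n(\lambda_1)\, dU_n(\lambda_2)$

$\le u_n^{-1} (2q(q+1))^{1/2}$

due to (12). Altogether this yields (10) and proves the theorem. □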
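Proof of Thm. 5.

We show that $(b^n u_n)^{-1} \prod_{k=0}^{n-1} \big( 1 + B \frac{o_k - u_k}{u_k} \big)$ vanishes almost surely if $n$ goes to
infinity:

First note that

$\prod_{k=0}^{n-1} \Big( 1 + B \frac{o_k - u_k}{u_k} \Big) \le \exp\Big\{ B (o_0 - u_0) \sum_{k=0}^{n-1} (b^k u_k)^{-1} \Big\}$

since $1 + x \le \exp x$ for $x \in \mathbb{R}$.

Remember that $b^k u_k = u_0 + \sum_{j=1}^k b^j y_j$. Due to Lemma 2 we can find $\delta > 1$ and
a random index $J_\delta$, a.s. finite, such that $Y_j > (\delta b^{-1})^j$ for all $j > J_\delta$. Consequently,
there is a $j_\delta \in \mathbb{N}$ with

(13) $(b^n u_n)^{-1} \prod_{k=0}^{n-1} \Big( 1 + B \frac{o_k - u_k}{u_k} \Big) \le (b^n u_n)^{-1} \exp\Big\{ \frac{\tilde B}{u_0} (j_\delta + 1) + \tilde B \sum_{k=j_\delta+1}^{n-1} \delta^{-k} \Big\}$

where $\tilde B = B(o_0 - u_0)$. The right hand side of (13) can be bounded for $n > j_\delta$ by

(14) $\frac{\exp\big\{ \frac{\tilde B}{u_0} (j_\delta + 1) \big\} \exp\big\{ \frac{\tilde B}{\delta - 1} \big\}}{\delta^n}$

which tends to 0 as $O(\delta^{-n})$ for $n \to \infty$. □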

4. Stability in the L p -norm 

In the setting of Theorem 4 the optimal filter is always stable in the $L^p$-norm
for every $p > 0$ since the total variation norm is bounded by 1 and so almost sure
convergence of the optimal filter implies

$E\big(\|\pi_n^{Y_{1:n}} - \tilde\pi_n^{Y_{1:n}}\|_{var}^p\big) \to 0$, $n \to \infty$,

by dominated convergence.

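Moreover, for some values of $p$ we can specify the rates of this convergence:

Theorem 9. For $p \in \big(0, -\frac{u_0}{\tilde B} \ln E(W_1^\beta)\big)$ we find

$E\big(\|\pi_n^{Y_{1:n}} - \tilde\pi_n^{Y_{1:n}}\|_{var}^p\big) = O(\rho^n)$, $n \to \infty$,

where $\rho = \big(E(W_1^\beta) e^{p \tilde B / u_0}\big)^{\frac{p}{p+\beta}}$ and $\tilde B = B(o_0 - u_0)$.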
Note that the theorem achieves slower rates than those in the pure Gamma case
since

$\rho = \big(E(W_1^\beta) e^{p \tilde B / u_0}\big)^{\frac{p}{p+\beta}} > E(W_1^\beta)^{\frac{p}{p+\beta}} \ge E\big(W_1^{\frac{\beta p}{p+\beta}}\big) > E(W_1^p)$.

The proof is based on Theorem 4 and the bound (14) for the total variation
distance. Consider first the following lemma specifying the behavior of the random
index $J_\delta$ introduced in the proof of Thm. 5.

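Lemma 10. Let $0 < \delta < E(W_1^\beta)^{-1/\beta}$. For

$J_\delta = \inf\{k \in \mathbb{N} : b^j Y_j > \delta^j \text{ for all } j > k\}$

we find

(15) $P(J_\delta > n) \le \frac{E(X_0^\beta)\, q^{n+1}}{\Gamma(\beta+1)(1 - q)}$, $n \in \mathbb{N}$,

where $q = \delta^\beta E(W_1^\beta)$.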

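Proof. Using (9) we have for $n \in \mathbb{N}_0$

$P(J_\delta > n) = P\Big( \bigcup_{j=n+1}^\infty \{b^j Y_j \le \delta^j\} \Big) \le \sum_{j=n+1}^\infty P(b^j Y_j \le \delta^j) \le \frac{E(X_0^\beta)}{\Gamma(\beta+1)} \cdot \frac{q^{n+1}}{1 - q}$.

□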



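Proof of Thm. 9. First note that for $s > 0$

$E(e^{s J_\delta}) = 1 + (e^s - 1) \sum_{n=0}^\infty e^{sn} P(J_\delta > n)$.

Due to Lemma 10 the expectation is finite if $e^s q < 1$, where $q = \delta^\beta E(W_1^\beta)$.

Now let $\delta = \big(E(W_1^\beta) e^{p \tilde B / u_0}\big)^{-\frac{1}{p+\beta}}$. Then $\delta$ satisfies $1 < \delta < E(W_1^\beta)^{-1/\beta}$. Now,
Lemma 10 applies and we obtain from (14) that

$E\big(\|\pi_n^{Y_{1:n}} - \tilde\pi_n^{Y_{1:n}}\|_{var}^p\big)$

$\le (Q \|h_0\|_{Lip})^p \, E\Big[ e^{\frac{p \tilde B}{\delta - 1}} \Big( e^{\frac{p \tilde B}{u_0}(J_\delta + 1)} \delta^{-pn} 1_{\{J_\delta < n\}} + u_0^{-p} e^{\frac{p \tilde B}{u_0} n} 1_{\{J_\delta \ge n\}} \Big) \Big]$

$\le (Q \|h_0\|_{Lip})^p \, e^{\frac{p \tilde B}{\delta - 1}} \Big( e^{\frac{p \tilde B}{u_0}} E\big(e^{\frac{p \tilde B}{u_0} J_\delta}\big) \delta^{-pn} + u_0^{-p} e^{\frac{p \tilde B}{u_0} n} P(J_\delta \ge n) \Big)$

$\le (Q \|h_0\|_{Lip})^p \, e^{\frac{p \tilde B}{\delta - 1}} \Big( e^{\frac{p \tilde B}{u_0}} E\big(e^{\frac{p \tilde B}{u_0} J_\delta}\big) \delta^{-pn} + \frac{u_0^{-p} E(X_0^\beta)}{\Gamma(\beta+1)(1 - q)} \big( e^{\frac{p \tilde B}{u_0}} q \big)^n \Big)$

which tends to 0 as $O(\rho^n)$ since $e^{p \tilde B / u_0} q = \delta^{-p} = \rho$ and $\rho < 1$ due to the assumptions
on $p$. □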
Note that in this proof the choice $\delta = \big(E(W_1^\beta) e^{p \tilde B / u_0}\big)^{-\frac{1}{p+\beta}}$ is optimal, as the two
components of the above bound pull in opposite directions: the first decays like
$\delta^{-pn}$, the second like $\big(e^{p \tilde B / u_0} \delta^\beta E(W_1^\beta)\big)^n$, and $\delta$ is chosen such that both
rates coincide.
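
The balancing argument behind this choice of $\delta$ can be checked numerically. The sketch below is an illustration under stated assumptions: `c` stands for the constant that multiplies $p$ in the exponential factor of $\rho$ (a hypothetical input here, not computed from $u_0$ and $o_0$), and the helper names are made up for this example. It verifies that for the chosen $\delta$ the two geometric factors $\delta^{-p}$ and $e^{pc} \delta^\beta E(W_1^\beta)$ agree, and compares the resulting rate with $E(W_1^p)$.

```python
import math


def beta_moment(alpha, beta, s):
    """E(W**s) for W ~ Beta(alpha, beta), via the Gamma-function identity."""
    return math.exp(math.lgamma(alpha + s) + math.lgamma(alpha + beta)
                    - math.lgamma(alpha) - math.lgamma(alpha + beta + s))


def balanced_delta(alpha, beta, p, c):
    """delta = (E(W**beta) * exp(p * c)) ** (-1 / (p + beta)); c is a hypothetical constant."""
    m = beta_moment(alpha, beta, beta)
    return (m * math.exp(p * c)) ** (-1.0 / (p + beta))


def mixed_rate(alpha, beta, p, c):
    """rho = delta ** (-p), the common decay rate of both components of the bound."""
    return balanced_delta(alpha, beta, p, c) ** (-p)


if __name__ == "__main__":
    alpha, beta, p, c = 2.0, 3.0, 0.5, 0.1
    delta = balanced_delta(alpha, beta, p, c)
    first = delta ** (-p)
    second = math.exp(p * c) * delta ** beta * beta_moment(alpha, beta, beta)
    print(delta, first, second, beta_moment(alpha, beta, p))
```

A larger $\delta$ speeds up the first factor but slows down the second, so equating them gives the best rate obtainable from this particular bound.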

References 

[1] J. A. Bather. Invariant conditional distributions. Ann. Math. Stat., 36:829-846, 1965.

[2] Yiping Chen and Nozer D. Singpurwalla. A non-Gaussian Kalman filter model for tracking software reliability. Stat. Sin., 4(2):535-548, 1994.

[3] Dan Crisan (ed.) and Boris Rozovskii (ed.). The Oxford handbook of nonlinear filtering. Oxford: Oxford University Press, 2011.

[4] Randal Douc, Eric Moulines, and Yaacov Ritov. Forgetting of the initial condition for the filter in general state-space hidden Markov chain: a coupling approach. Electron. J. Probab., 14:27-49, 2009.

[5] Norman L. Johnson, Samuel Kotz, and N. Balakrishnan. Continuous univariate distributions. Vol. 2. 2nd ed. New York, NY: Wiley, 1995.

[6] Nozer D. Singpurwalla and Simon P. Wilson. Statistical methods in software engineering. Reliability and risk. Springer Series in Statistics. Springer-Verlag, New York, 1999.

[7] Wilhelm Stannat. Stability of the optimal filter for nonergodic signals - a variational approach. Oxford: Oxford University Press, 2011.

[8] Ramon van Handel. Discrete time nonlinear filters with informative observations are stable. Electron. Commun. Probab., 13:562-575, 2008.

[9] Ramon van Handel. Uniform observability of hidden Markov models and filter stability for unstable signals. Ann. Appl. Probab., 19(3):1172-1199, 2009.



